A new method for automatically recognizing Arabic handwritten words is presented, based on a convolutional neural network architecture. The proposed method relies on a multilayer type of neural network and therefore needs a large dataset of word images to obtain the best results. To optimize our system, a new database was generated from the benchmark Arabic handwriting database by preprocessing, namely rotation transformations applied to the database images to create new images with different features. The convolutional neural network was applied to this database, which contains 40,320 Arabic handwritten word images (26,880 for the training set and 13,440 for the test set). Different configurations on a public benchmark database were evaluated and compared with previous methods, and a recognition rate of 96.76% was achieved.

Keywords: Handwriting analysis; Handwritten Arabic word recognition
1. INTRODUCTION

An Arabic handwriting recognition system transforms Arabic handwriting into its symbolic representation. Such systems address several tasks: the recognition of handwritten text, of words and of characters. Our method focuses on Arabic handwritten word recognition, which follows one of two approaches: the analytical approach and the global approach. The analytical approach recognizes each letter that composes the word, but letter segmentation is a difficult operation. We therefore adopt the global approach, which recognizes the word as a whole without trying to locate each of the letters that compose it. The global approach is the basis of our investigation.

Neural networks are computing systems used for recognition tasks; they contain three kinds of layers: an input layer, hidden layers and an output layer. Deep neural networks have enabled great progress on several recognition problems in scientific research, such as object detection (for example, [1]-[4]) and Arabic handwritten character recognition [5]. Because deep neural networks use several hidden layers, they need a large number of connection parameters and very large image databases. In our work, a convolutional neural network (CNN), which has a small number of parameters and is easy to train, was used with the Arabic handwriting database (AHDB). Moreover, CNNs can learn complex nonlinear mappings from a very large number of complicated inputs (images or sounds) [6], [7]. An advantage of CNNs is that each convolutional layer applies the same filter and weights to every position of its input, which decreases the number of parameters and increases performance [8].
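To make the saving from weight sharing concrete, the short Python sketch below counts the trainable parameters (weights plus one bias per output) of the first convolutional layer described in Section 2.4 (eight 5x5 filters on a one-channel 100x100 image) and compares it with a hypothetical fully connected layer producing the same 96x96x8 output. The helper names are illustrative and are not taken from the paper's MATLAB implementation.

```python
def conv_params(kernel_size: int, in_channels: int, num_filters: int) -> int:
    """Conv layer: one shared N x N x H kernel plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * num_filters


def dense_params(in_units: int, out_units: int) -> int:
    """Dense layer: one weight per input/output pair plus one bias per output."""
    return (in_units + 1) * out_units


# First convolutional layer: eight 5x5 filters on a 100x100x1 image -> 96x96x8 output.
shared = conv_params(kernel_size=5, in_channels=1, num_filters=8)
# Hypothetical dense layer mapping the same input to an output of the same size.
unshared = dense_params(in_units=100 * 100, out_units=96 * 96 * 8)

print(f"convolutional layer: {shared:,} parameters")
print(f"equivalent dense layer: {unshared:,} parameters")
```

Under these assumptions the convolutional layer needs 208 parameters, while the unshared dense layer would need 737,353,728.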

Several studies in the literature have used various classifiers to recognize Arabic handwritten words or characters, such as hidden Markov models (HMM) [9]-[14], k-nearest neighbors (KNN) [15], [16], support vector machines (SVM) [17] and neural networks [18]-[20], [5], [21], [22]. However, each of these classifiers still has weak points [23]. For instance, SVMs require heavy computation of kernels [24], the extreme learning machine (ELM) performs unstably because of the random weights between the input and hidden layers [25], and the multilayer perceptron (MLP) relies on back-propagation, which slows training [26]. Ghadhban et al. [23] proposed incorporating good classifiers that are computationally simpler and can achieve strong Arabic handwriting recognition performance. Despite the research carried out to improve the performance of Arabic handwriting recognition methods, the field still faces problems related to computation time and recognition results. In Rabi et al. [27], the HMM-based reference method was enhanced by a hybrid HMM/MLP, with hidden Markov models built on extracted statistical and geometrical features. In a more recent paper, the authors proposed recognizing Arabic handwritten words without segmenting them into letters, combining the scale invariant feature transform (SIFT) as feature extractor with SVMs as classifier [28]. Likewise, in our previous work [29], we proposed a method for recognizing Arabic handwritten text that integrates an n-gram model.

An Arabic handwritten text recognition system must segment the text into lines, and the lines into letters or words, before recognizing them. Segmenting and recognizing characters is difficult because of variations in writing style and because letters are linked to one another. The proposed method therefore uses the global approach, which does not need character segmentation, with a convolutional neural network. In related works, the algorithms were applied to small databases of Arabic handwritten words [30]-[33] (IFN/ENIT, AHDB, ...), which limits their ability to recognize the full variety of Arabic handwritten word images. This led us to create a new database of Arabic handwritten words by preprocessing the word images (for example, with rotation transformations) to improve Arabic handwritten word recognition results.


2. METHOD 
2.1. Motivation 

To increase the performance of Arabic handwritten word recognition, we build on knowledge from previous research. The wide variation in Arabic handwriting styles makes it worthwhile to propose a new method that addresses the problems of Arabic handwriting recognition. Because handwriting styles vary and letters are linked to one another, segmenting and recognizing letters is a difficult operation; it is sufficient to recognize whole words, without character segmentation, using a convolutional neural network. A deep learning system needs a large amount of data (images) to make good decisions. Since none of the Arabic handwritten word databases in the literature contains enough images to obtain good results, we propose a method that generates images from AHDB by preprocessing (such as rotation). This improves the performance of our method.

Several authors have used CNNs for the feature extraction stage [30], [34]. Others combine CNNs with other classifiers (SVM and HMM) to classify handwritten Arabic words [30], [34]. In our work, the CNN is used for both the feature extraction and the classification steps of Arabic handwritten word recognition.


2.2. Architecture 

Arabic handwritten word recognition systems usually apply a few preprocessing steps to the input images to increase recognition performance. For a CNN-based recognition system, however, it is not necessary to apply many preprocessing operations to the input images to reduce handwriting variability. In our proposed system, the preprocessing step uses binarization, normalization and rotation transformation. After normalization, the input image size is 100x100. The preprocessing step prepares the input data of the CNN, X1, X2, ..., Xn, which are word images. We use three layer types: convolutional layers (CONVL), pooling layers (POOLL) and fully connected layers (FCL). The input data of CONVL are M x M x H images, where M is the height and width of the input image and H is the number of channels; each image has M x M pixels. In our system we use grayscale images with H = 1 (one channel), whereas an RGB image would have three channels, H = 3, as shown in Figure 1.
Figure 1. Architecture of proposed method


Our CNN architecture starts with a CONVL consisting of 8 feature maps computed by sliding an N x N kernel (with overlap) over the M x M raw grayscale input image, with N = 5 and M = 100. We then apply the rectified linear unit (ReLU) nonlinear activation function, which takes the maximum of each pixel value and 0. To each feature map we apply a POOLL using max-pooling, which keeps the maximum pixel in each non-overlapping 2x2 window. Finally, the FCLs are applied to the output of the CONVL and POOLL layers, as in a standard convolutional neural network.


2.3. Preprocessing 
2.3.1. Binarization 

If the database images are grayscale or color, binarization separates the word from its background to produce binary images: pixel = 1 in the background and pixel = 0 in the text, or the reverse. Global thresholding computes a single threshold for the whole image: pixels above the threshold are assigned 0 and the others are assigned 1 [34].
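The paper does not say how the global threshold is chosen. The sketch below assumes Otsu's method (a common choice for global thresholding, swapped in here as an assumption) on an 8-bit grayscale image stored as a list of rows, and then applies the rule stated above: pixels above the threshold become 0 and the others become 1.

```python
def otsu_threshold(image: list[list[int]]) -> int:
    """Global threshold for an 8-bit grayscale image (assumed: Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_b, sum_b = 0, 0
    for t in range(256):
        weight_b += hist[t]            # pixels with intensity <= t
        if weight_b == 0:
            continue
        weight_f = total - weight_b    # pixels with intensity > t
        if weight_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / weight_b
        mean_f = (sum_all - sum_b) / weight_f
        # Between-class variance; Otsu keeps the t that maximizes it.
        between = weight_b * weight_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """One threshold for the whole image: above it -> 0, otherwise -> 1."""
    t = otsu_threshold(image)
    return [[0 if p > t else 1 for p in row] for row in image]
```

For dark ink on light paper, this rule maps the ink to 1 and the background to 0, i.e. the "reverse" convention mentioned above.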


2.3.2. Normalization 

In a CNN system, all database images must have the same size. Normalization brings the images to a common size [29]. In the present investigation, all images were normalized to 100x100.
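The interpolation used for normalization is not specified. As an assumption, the sketch below resizes a 2-D image (a list of rows) to 100x100 with nearest-neighbour sampling, which has the convenient property of keeping binary images binary.

```python
def normalize_size(image, out_h: int = 100, out_w: int = 100):
    """Resize a 2-D image to out_h x out_w by nearest-neighbour sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Each output pixel copies the source pixel at the proportionally scaled position.
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```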


2.3.3. Rotation 

Convolutional neural networks need a large training set, and the literature offers no database of Arabic handwritten words large enough for our system. We therefore create new images from the existing database by modifying the images to change their characteristics, using a rotation transformation. Rotating the whole image around its origin by the angle θ maps a point (x, y) of the image to the new coordinates (x', y') given in (1).


$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (1)$$
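Equation (1) can be applied to a whole image by inverse mapping: each output pixel is rotated back by -θ to find its source pixel. The sketch below assumes that the rotation is taken about the image centre (used as the origin), nearest-neighbour sampling and a background fill value of 0; the paper does not list the rotation angles it used, so the angle in any example is purely illustrative.

```python
import math


def rotate_point(x: float, y: float, theta: float) -> tuple[float, float]:
    """Eq. (1): rotate (x, y) about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c


def rotate_image(img, theta: float, fill=0):
    """Rotate a 2-D image about its centre (assumed origin) with nearest-neighbour sampling.

    Coordinates are (column, row) with rows growing downward, so a positive
    theta appears clockwise when the image is displayed.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for col in range(w):
            # Inverse mapping: rotate the output position by -theta to find its source.
            x, y = rotate_point(col - cx, r - cy, -theta)
            src_c, src_r = round(x + cx), round(y + cy)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][col] = img[src_r][src_c]
    return out
```

Inverse mapping is used instead of pushing each source pixel forward so that every output pixel receives exactly one value and no holes appear.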


2.4. Convolutional neural network layers 
2.4.1. Convolutional layer 

The inputs of this layer are the images X1, X2, ..., Xn. The input data of CONVL have size M x M x H, where M is the height and width of the input image (Figure 2) and H is the number of channels per pixel: the input image has M x M pixels, and H equals one channel for a binary image and three channels for an RGB image. The K filters (kernels) used in CONVL have size N x N x F, where N is the height and width of each filter and F equals the number of image channels H (Figure 2(a)). Convolving each filter with the image produces a feature map of size (M-N+1) x (M-N+1), giving K feature maps (Figure 2(b)). The goal of the convolutional layer is to extract salient features of the input images.
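A minimal sketch of the convolution described above, written in plain Python for readability: each N x N kernel slides over an M x M single-channel image with stride 1 and no padding, producing an (M-N+1) x (M-N+1) feature map. As in most CNN libraries, the kernel is not flipped (strictly a cross-correlation); the function names are illustrative.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide an N x N kernel over an M x M image (stride 1, no padding)."""
    m, n = len(image), len(kernel)
    out = m - n + 1  # feature-map size M - N + 1
    return [
        [
            # Weighted sum of the N x N window whose top-left corner is (r, c).
            sum(image[r + i][c + j] * kernel[i][j] for i in range(n) for j in range(n)) + bias
            for c in range(out)
        ]
        for r in range(out)
    ]


def conv_layer(image, kernels, biases):
    """Apply K kernels to one single-channel image, giving K feature maps."""
    return [conv2d_valid(image, k, b) for k, b in zip(kernels, biases)]
```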

In our proposed approach, the rectified linear unit (ReLU) activation function is applied to the output of the convolutional layer: it takes the maximum of each pixel value and 0, so negative values are replaced by 0. ReLU non-linearity is applied to each output of CONVL and FCL. The ReLU [35] increases the nonlinear properties of the overall network without affecting the receptive fields of the convolution layer.
Figure 2. Convolutional layer: (a) example of convolutional operation and (b) example of single
convolutional layer
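The ReLU step described before Figure 2 replaces every negative value of a feature map by 0, i.e. f(x) = max(0, x). A one-function sketch in the same plain-Python style, applied element-wise to a 2-D feature map:

```python
def relu(feature_map):
    """Element-wise ReLU: max(0, x) for every value of a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]
```

In the proposed pipeline it would be applied to each feature map produced by a convolutional layer, before pooling.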


2.4.2. Pooling layer 

After each convolutional layer, we apply a pooling layer to its output to reduce the dimensions of the feature maps. Pooling takes the maximum or the average over q x q blocks of each feature map, where q is usually between 2 and 5 for large inputs. Of these two types of pooling (max and average), our approach uses max-pooling of size 2x2, which selects the maximum pixel in each block of the convolutional layer's output feature map. The output of max-pooling is a feature map that retains the most important features of the previous feature map (Figure 3).


[Figure 3 diagram: feature map of size (M-N+1) x (M-N+1) passed through the POOL layer]

Figure 3. Pooling layer 
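The non-overlapping q x q max-pooling described above (q = 2 in the proposed network) can be sketched as follows; rows or columns that do not fill a whole block are simply dropped, which is an assumption since every map in the proposed network has an even size.

```python
def max_pool(feature_map, q=2):
    """Non-overlapping q x q max-pooling of a 2-D feature map."""
    h, w = len(feature_map) // q, len(feature_map[0]) // q
    return [
        # Keep the largest value of the q x q block starting at (r*q, c*q).
        [max(feature_map[r * q + i][c * q + j] for i in range(q) for j in range(q)) for c in range(w)]
        for r in range(h)
    ]
```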


2.4.3. Fully connected layer 

After several convolutional and max-pooling layers, we apply a fully connected layer, which uses the results of the convolution/pooling process to classify the image into a label. The feature maps are converted into a single vector of values, and this layer connects every neuron of the previous layer to each of its own neurons. Each neuron of the output layer corresponds to a class label and gives the probability that the input belongs to that label. Finally, the softmax function is applied to the network output to compute a probability value for each class.
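A minimal sketch of this classification head: the pooled feature maps are flattened into one vector, a fully connected layer computes one score per class, and softmax turns the scores into probabilities. Plain Python lists stand in for the MATLAB layers here, and any weights used with it are illustrative rather than trained parameters.

```python
import math


def flatten(feature_maps):
    """Concatenate K feature maps (each a list of rows) into a single vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(x, weights, biases):
    """One output neuron per row of `weights`: z_k = sum_i w_ki * x_i + b_k."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Convert class scores into probabilities (max subtracted for numerical stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is the index of the largest softmax probability.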

The architecture of our convolutional network for Arabic handwritten word recognition is: INPUT > CONVL > ReLU > POOLL > CONVL > ReLU > POOLL > CONVL > ReLU > POOLL > FCL. The first layer, C1, is a convolutional layer with 8 feature maps computed by sliding a 5x5 filter (with overlap) over the 100x100 raw binary input image; its output size is (100-5)+1 = 96, giving 8 feature maps of 96x96. P2 is a max-pooling layer applied to the output of C1 with a non-overlapping 2x2 kernel; its output size is 96/2 = 48, giving 8 feature maps of 48x48. The second convolutional layer, C3, computes 16 feature maps from the output of P2 with a 5x5 filter; its output size is 48-5+1 = 44, giving 16 feature maps of 44x44. Layer P4 computes 16 feature maps of 22x22 (44/2 = 22) by non-overlapping 2x2 max-pooling of the output of C3. The convolutional layer C5 computes 32 feature maps of 20x20 from the output of P4 with a 5x5 kernel and padding 1; its output size is (22-5+2x1)+1 = 20. The last max-pooling layer, P6, computes 32 feature maps of 10x10 (20/2 = 10) from the output of C5 with a non-overlapping 2x2 filter. This yields 10x10x32 = 3,200 features as the input of the fully connected layer FC7, whose 96 outputs are passed to a softmax classifier to produce the 96 output classes (Figure 4).


[Figure 4 diagram: input image 100x100 > C1 convolutional layer 96x96x8 > P2 max-pooling layer 48x48x8 > C3 convolutional layer 44x44x16 > P4 max-pooling layer 22x22x16 > C5 convolutional layer 20x20x32 > P6 max-pooling layer 10x10x32 > FC7 fully connected layer]


Figure 4. CNN proposed for Arabic handwritten words recognition 
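The feature-map sizes in Figure 4 follow from two formulas: a stride-1 convolution with an N x N kernel and padding p maps a size S to S - N + 2p + 1, and non-overlapping 2x2 max-pooling halves it. The sketch below re-derives the C1 to P6 sizes and the FC7 input length; the layer table is transcribed from the text, and the function names are illustrative.

```python
def conv_out(size: int, kernel: int, padding: int = 0) -> int:
    """Output width of a stride-1 convolution: S - N + 2p + 1."""
    return size - kernel + 2 * padding + 1


def pool_out(size: int, q: int = 2) -> int:
    """Output width of non-overlapping q x q pooling."""
    return size // q


# (name, kind, number of feature maps, padding) as described for the proposed CNN.
LAYERS = [
    ("C1", "conv", 8, 0),
    ("P2", "pool", 8, 0),
    ("C3", "conv", 16, 0),
    ("P4", "pool", 16, 0),
    ("C5", "conv", 32, 1),
    ("P6", "pool", 32, 0),
]


def trace_shapes(input_size: int = 100, kernel: int = 5):
    """Return (name, height, width, maps) per layer and the FC7 input length."""
    size, shapes = input_size, []
    for name, kind, maps, padding in LAYERS:
        size = conv_out(size, kernel, padding) if kind == "conv" else pool_out(size)
        shapes.append((name, size, size, maps))
    _, h, w, k = shapes[-1]
    return shapes, h * w * k


shapes, fc_inputs = trace_shapes()
for name, h, w, k in shapes:
    print(f"{name}: {h}x{w}x{k}")
print(f"FC7 inputs: {fc_inputs}")
```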


3. RESULTS AND DISCUSSION 
3.1. Dataset 

A convolutional neural network needs a large training set of handwritten word images to perform well. We therefore collected a dataset of Arabic handwritten words based on a benchmark database, the Arabic handwriting database (AHDB). This database contains images of words written as legal amounts on Arabic checks and Arabic handwritten pages by 100 writers [36]; it is available at http://handwriting.qu.edu.qa/dataset/. It contains 105 samples for each of 96 word classes, i.e. 10,080 images in total, which is not enough as input for a convolutional neural network to obtain good results. To solve this problem, these images were used to generate new ones through preprocessing, namely rotation transformations applied in two ways. This yields 420 word images per class, i.e. 40,320 images in total. The 96-class database was divided into two sets: a training set (26,880 word images, 280 per class) and a test set (13,440 word images, 140 per class).
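The dataset sizes above can be checked with a few lines of arithmetic, and a per-class split keeps every class represented in both sets. The paper does not say how images were assigned to the training and test sets, so the shuffled per-class split below (with an arbitrary seed) is an assumption; `samples_by_class` is a hypothetical mapping from each class label to that class's 420 images.

```python
import random

NUM_CLASSES = 96
ORIGINAL_PER_CLASS = 105    # AHDB samples per word class
AUGMENTED_PER_CLASS = 420   # after rotation-based augmentation
TRAIN_PER_CLASS, TEST_PER_CLASS = 280, 140

# Consistency checks on the figures reported in the text.
assert NUM_CLASSES * ORIGINAL_PER_CLASS == 10_080
assert NUM_CLASSES * AUGMENTED_PER_CLASS == 40_320
assert TRAIN_PER_CLASS + TEST_PER_CLASS == AUGMENTED_PER_CLASS


def split_per_class(samples_by_class, n_train=TRAIN_PER_CLASS, seed=0):
    """Shuffle each class independently; its first n_train samples go to training."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        train.extend((s, label) for s in items[:n_train])
        test.extend((s, label) for s in items[n_train:])
    return train, test
```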


3.2. Experiments and results 

As part of this work, a method of Arabic handwritten word recognition using convolutional neural networks has been proposed to transform handwritten word images into their symbolic representations. The program was implemented in MATLAB 2018a with CUDA SDK v7.5 and run under Windows on a 1.70 GHz Core i5 PC with an NVIDIA GeForce GT 635M GPU and 6 GB of memory. The method was applied to 40,320 images of Arabic handwritten words written by various writers, divided into a training set of 26,880 word images (280 per class) and a test set of 13,440 word images (140 per class), and its performance was rated by the recognition success rate on the test set. Our algorithm runs for 8 epochs, and the CNN starts to decrease the misclassification error from epoch 6 (Figure 5(a)). The results of our system are promising: we achieve a recognition rate of 96.76% (Figure 5(b)).
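The success rate reported here is the percentage of test images whose predicted class equals the true class, and the misclassification rate is its complement. A minimal sketch (function names are illustrative):

```python
def recognition_rate(true_labels, predicted_labels):
    """Percentage of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def misclassification_rate(true_labels, predicted_labels):
    """Complement of the recognition rate, in percent."""
    return 100.0 - recognition_rate(true_labels, predicted_labels)
```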

We compared our results with previously published methods of Arabic handwritten word recognition; Table 1 lists the recognition rates of these previous works. Our CNN-based method on AHDB (96.76%) outperforms most of the listed methods, although Amrouch et al. [32] report a higher rate (99.08%) on the same database and Kessentini et al. [10] report 96.82% on IFN/ENIT.
Figure 5. Recognition and misclassification rate of Arabic handwritten words: (a) recognition rate and
(b) misclassification rate


Table 1. Word extraction rates 


| Method | Classifier | Data | Rate |
| --- | --- | --- | --- |
| Alkhoury [9] | CNN, SVM | HACDB and IFN/ENIT | 94.17% (HACDB), 92.95% (IFN/ENIT) |
| Elleuch et al. [1] | SVM | HACDB | 91.14% |
| Jayech et al. [2] | MSHMM | IFN/ENIT | 91.10% (set a) |
| Tamen et al. [31] | HMM/MLP | IFN/ENIT | 89.03% |
| Kessentini et al. [10] | Multilayer perceptron, SVM and ELM | IFN/ENIT | 96.82% |
| Alkhateeb et al. [11] | CNN-based HMM | IFN/ENIT | 89.23% |
| Amrouch et al. [32] | SVM | AHDB | 99.08% |
| Lamsaf [33] | k-nearest neighbors (KNN) | AHDB | 86.7% |
| AWNI [12] | Deep convolutional neural networks | AlexU-W and IFN/ENIT | 96.11% |
| Proposed method | Convolutional neural network (CNN) | AHDB | 96.76% |
Proposed method Convolutional neural network (CNN) AHDB Database 96.76% 


4. CONCLUSION 

Arabic handwritten word recognition is an active research field whose performance still needs to improve. In the present paper, a method of Arabic handwritten word recognition has been proposed using a convolutional neural network (CNN), in order to bring new pattern recognition technologies to the task. Compared with other deep learning systems, CNNs give the best results on large image datasets in the image processing field. Because no large database of Arabic handwritten words is available for a CNN system, we collected a new database from the benchmark Arabic handwriting database (AHDB) by preprocessing, applying rotation transformations to the database images to create new images with different features. Applied to the AHDB benchmark, the method reaches a recognition rate of 96.76%.